Behind the Beats: A Feature Analysis of Songs by Spotify’s Fab Five

Authors

Brian Kwon

Jorge Bris Moreno

Aaron Schwall

Joe Zhang

Abstract

This project examines specific audio features of the five most streamed Spotify artists of 2022. We used the Spotify API to download information on seven audio features for each of the five most streamed artists: BTS, Bad Bunny, The Weeknd, Taylor Swift, and Drake. The features studied are Acousticness, Danceability, Energy, Loudness, Speechiness, Tempo, and Valence. We performed exploratory data analysis to inform our hypotheses, and then applied three different hypothesis tests to determine which features were common among the artists: pairwise t-tests, Wilcoxon rank sum tests, and bootstrap tests. We found no significant difference in tempo among the top five artists, while every other audio feature differed for at least one artist.


Introduction

This project aims to compare various features of the top 5 most streamed artists on Spotify in 2022 (Spangler, n.d.). The goal is to identify which features are statistically the same or different among those artists, and eventually to determine the range of each feature associated with more streams. These tests cannot tell us whether a feature actually determines popularity, but they are useful for comparison and analysis. If a feature does not differ between any of the artists, and if that feature does influence the number of streams, this would suggest that an optimal range exists for it.

For this study, all features obtained from the Spotify API are used except Instrumentalness and Liveness. Instrumentalness is zero for a very large share of tracks because these artists are all singers, so this feature is out of the scope of this project. Liveness describes whether the music was performed live, which we do not consider related to the number of streams.

The rest of the features and their meanings are as follows (Spotify, n.d.):

  • Acousticness: Measures from 0 to 1 whether a track is acoustic or not (0 being the lowest).
  • Danceability: Measures how suitable the music is for dancing (0 being the least suitable).
  • Energy: Measures the perception of intensity and activity of a song from 0 to 1 (0 being the lowest).
  • Loudness: Measures the loudness of tracks in decibels (from -60 to 0).
  • Speechiness: Measures the ratio of spoken words to music in a track from 0 to 1 (0 being the lowest).
  • Tempo: Measures the beats per minute (referring to the pace of the track and the average beat duration).
  • Valence: Measures the positiveness of a track from 0 to 1 (0 being the most negative value and 1 being the most positive value).

Our hypothesis testing compares each feature pairwise across all pairs of artists. For every feature, the null hypothesis is that there is no difference in that feature between any pair of artists, while the alternative hypothesis is that there is a statistically significant difference between at least one pair of artists on that feature.

The hypotheses can be written as follows:

\(H_0: \forall f\in F [\forall a,b \in A, a \neq b [\mu_{f,a} - \mu_{f,b} = 0]]\)
\(H_A: \exists f\in F [\exists a,b \in A, a \neq b [\mu_{f,a} - \mu_{f,b} \neq 0]]\)

\(F\) = {\(\text{Acousticness, Danceability, Energy, Loudness, Speechiness, Tempo, Valence}\)}
\(A\) = {\(\text{Taylor Swift, BTS, Bad Bunny, Drake, The Weeknd}\)}
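
With five artists and seven features, the pairwise design above amounts to 10 artist pairs and 70 comparisons per test. A minimal sketch in base R that enumerates them; the variable names are illustrative and are not part of the analysis code:

```r
# Artists and features as defined in the sets A and F above
artists  <- c("Taylor Swift", "BTS", "Bad Bunny", "Drake", "The Weeknd")
features <- c("Acousticness", "Danceability", "Energy", "Loudness",
              "Speechiness", "Tempo", "Valence")

# combn() returns one column per unordered pair (a, b) with a != b
artist_pairs <- combn(artists, 2)
n_pairs <- ncol(artist_pairs)

# Every feature is compared on every pair of artists
n_comparisons <- n_pairs * length(features)

cat("Artist pairs:", n_pairs, "\n")
cat("Comparisons per test:", n_comparisons, "\n")
```

Running this many comparisons is the reason the pairwise tests later in the report use adjusted p-values.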

Data Collection

To access and compare the audio features of the most streamed artists on Spotify in 2022, we first created a Spotify Developer account. This account gives access to Spotify’s API, which exposes information about artists, tracks, and audio features. We retrieved the data with the get_artist_audio_features() function from the spotifyr R package, which wraps the API and returns the audio features of a given artist’s tracks. These audio features, such as tempo, loudness, energy, and danceability, provide insight into the musical qualities of these artists’ songs.

Data Preparation
library(spotifyr)
library(ggplot2)
library(ggcorrplot)
library(viridis)
library(plotly, quietly = TRUE)

# Credentials come from the Spotify Developer dashboard; real values should never be published
Sys.setenv(SPOTIFY_CLIENT_ID = "<your-client-id>")
Sys.setenv(SPOTIFY_CLIENT_SECRET = "<your-client-secret>")

access_token = get_spotify_access_token()

Taylor_Swift = get_artist_audio_features("Taylor Swift")
BTS = get_artist_audio_features("BTS")
Bad_Bunny = get_artist_audio_features("Bad Bunny")
Drake = get_artist_audio_features("Drake")
The_weeknd = get_artist_audio_features("The Weeknd")

ts = data.frame(Taylor_Swift$artist_name,Taylor_Swift$acousticness,
                Taylor_Swift$danceability,Taylor_Swift$energy,
                Taylor_Swift$instrumentalness,Taylor_Swift$liveness,
                Taylor_Swift$loudness,Taylor_Swift$speechiness,
                Taylor_Swift$tempo, Taylor_Swift$valence,
                Taylor_Swift$track_name, Taylor_Swift$album_name,Taylor_Swift$album_release_year)
colnames(ts) = c("Artist_name","Acousticness","Danceability","Energy", "Instrumentalness",
                              "Liveness","Loudness","Speechiness",
                              "Tempo","Valence","Track_name","Album_name","Album_release_year")

bts = data.frame(BTS$artist_name,BTS$acousticness,
                BTS$danceability,BTS$energy,
                BTS$instrumentalness,BTS$liveness,
                BTS$loudness,BTS$speechiness,
                BTS$tempo, BTS$valence,
                BTS$track_name, BTS$album_name,BTS$album_release_year)
colnames(bts) = c("Artist_name","Acousticness","Danceability","Energy", "Instrumentalness",
                              "Liveness","Loudness","Speechiness",
                              "Tempo","Valence","Track_name","Album_name","Album_release_year")

bb = data.frame(Bad_Bunny$artist_name,Bad_Bunny$acousticness,
                Bad_Bunny$danceability,Bad_Bunny$energy,
                Bad_Bunny$instrumentalness,Bad_Bunny$liveness,
                Bad_Bunny$loudness,Bad_Bunny$speechiness,
                Bad_Bunny$tempo, Bad_Bunny$valence,
                Bad_Bunny$track_name, Bad_Bunny$album_name,Bad_Bunny$album_release_year)
colnames(bb) = c("Artist_name","Acousticness","Danceability","Energy", "Instrumentalness",
                              "Liveness","Loudness","Speechiness",
                              "Tempo","Valence","Track_name","Album_name","Album_release_year")

dk = data.frame(Drake$artist_name,Drake$acousticness,
                Drake$danceability,Drake$energy,
                Drake$instrumentalness,Drake$liveness,
                Drake$loudness,Drake$speechiness,
                Drake$tempo, Drake$valence,
                Drake$track_name, Drake$album_name,Drake$album_release_year)
colnames(dk) = c("Artist_name","Acousticness","Danceability","Energy", "Instrumentalness",
                              "Liveness","Loudness","Speechiness",
                              "Tempo","Valence","Track_name","Album_name","Album_release_year")

wd = data.frame(The_weeknd$artist_name,The_weeknd$acousticness,
                The_weeknd$danceability,The_weeknd$energy,
                The_weeknd$instrumentalness,The_weeknd$liveness,
                The_weeknd$loudness,The_weeknd$speechiness,
                The_weeknd$tempo, The_weeknd$valence,
                The_weeknd$track_name, The_weeknd$album_name,The_weeknd$album_release_year)
colnames(wd) = c("Artist_name","Acousticness","Danceability","Energy", "Instrumentalness",
                              "Liveness","Loudness","Speechiness",
                              "Tempo","Valence","Track_name","Album_name","Album_release_year")

df = rbind(ts, bts, bb, dk, wd)
head(df)
   Artist_name Acousticness Danceability Energy Instrumentalness Liveness
1 Taylor Swift     0.009420        0.757  0.610         3.66e-05   0.3670
2 Taylor Swift     0.088500        0.733  0.733         0.00e+00   0.1680
3 Taylor Swift     0.000421        0.511  0.822         1.97e-02   0.0899
4 Taylor Swift     0.000537        0.545  0.885         5.59e-05   0.3850
5 Taylor Swift     0.000656        0.588  0.721         0.00e+00   0.1310
6 Taylor Swift     0.012100        0.636  0.808         2.18e-05   0.3590
  Loudness Speechiness   Tempo Valence
1   -4.840      0.0327 116.998   0.685
2   -5.376      0.0670  96.057   0.701
3   -4.785      0.0397  94.868   0.305
4   -5.968      0.0447  92.021   0.206
5   -5.579      0.0317  96.997   0.520
6   -5.693      0.0729 160.058   0.917
                                     Track_name
1        Welcome To New York (Taylor's Version)
2                Blank Space (Taylor's Version)
3                      Style (Taylor's Version)
4           Out Of The Woods (Taylor's Version)
5 All You Had To Do Was Stay (Taylor's Version)
6               Shake It Off (Taylor's Version)
                        Album_name Album_release_year
1 1989 (Taylor's Version) [Deluxe]               2023
2 1989 (Taylor's Version) [Deluxe]               2023
3 1989 (Taylor's Version) [Deluxe]               2023
4 1989 (Taylor's Version) [Deluxe]               2023
5 1989 (Taylor's Version) [Deluxe]               2023
6 1989 (Taylor's Version) [Deluxe]               2023
Checking NAs
cat("There is", sum(is.na(df)),"NA values.")
There is 0 NA values.

Tests

T-Test

The t-test compares two means to determine whether they differ significantly. It is based on the t-statistic, which is the difference between the means of two samples divided by the standard error of that difference. We used pairwise t-tests to test whether the mean of each feature differs between each pair of artists. The test returns a matrix of p-values, one for each artist pair, adjusted for multiple comparisons with the Holm method. Each p-value is the probability, under the null hypothesis, of observing a t-statistic as extreme or more extreme than the one obtained. We used a 95% confidence level, that is, a significance level of 0.05. This means that if the p-value was below 0.05, we rejected the null hypothesis in favor of the alternative hypothesis for that artist pairing.
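
To make the t-statistic concrete, here is a minimal sketch that computes it by hand for two groups and checks it against R's built-in test. The data are synthetic, not Spotify data. The sketch pools the standard deviation over only the two groups being compared, whereas pairwise.t.test() by default pools it over all groups, so this illustrates the idea rather than reproducing the report's numbers.

```r
set.seed(1)
# Synthetic feature values for two hypothetical artists (not Spotify data)
x <- rnorm(40, mean = 0.60, sd = 0.10)
y <- rnorm(50, mean = 0.55, sd = 0.10)

n1 <- length(x)
n2 <- length(y)

# Pooled variance of the two samples
sp2 <- ((n1 - 1) * var(x) + (n2 - 1) * var(y)) / (n1 + n2 - 2)

# t statistic: difference in means divided by its standard error
se     <- sqrt(sp2 * (1 / n1 + 1 / n2))
t_stat <- (mean(x) - mean(y)) / se

# Two-sided p-value from the t distribution with n1 + n2 - 2 degrees of freedom
p_val <- 2 * pt(-abs(t_stat), df = n1 + n2 - 2)

# Cross-check against the built-in equal-variance two-sample t-test
ref <- t.test(x, y, var.equal = TRUE)
cat("manual t:", t_stat, " t.test t:", unname(ref$statistic), "\n")
cat("manual p:", p_val, " t.test p:", ref$p.value, "\n")
```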

Wilcoxon Rank Sum Test

The Wilcoxon rank sum test is a non-parametric statistical test used to find whether there is a statistically significant difference in location between two independent samples. As a non-parametric test, it does not assume that the data are normally distributed. It works by pooling the observations from both samples and ranking them from smallest to largest. The ranks belonging to each sample are then summed, and the test statistic is computed from these rank sums. A p-value is then calculated, which is the probability, under the null hypothesis, of obtaining a test statistic as extreme or more extreme than the one obtained; this is the value returned in our hypothesis tests. For this test, we used a 95\(\%\) confidence level, which means that if the p-value is less than 0.05 we reject the null hypothesis, while if it is greater than 0.05 we fail to reject it.
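
A minimal sketch of how the rank sum statistic is formed, using synthetic data: both samples are pooled and ranked together, and the ranks of the first sample are summed. R's wilcox.test() reports this rank sum minus its smallest possible value, n1(n1 + 1)/2, which is what the sketch checks.

```r
set.seed(2)
# Synthetic values for two hypothetical groups (not Spotify data)
x <- rnorm(30, mean = 0.5)
y <- rnorm(35, mean = 0.0)
n1 <- length(x)

# Rank all observations together, then sum the ranks that belong to x
ranks      <- rank(c(x, y))
rank_sum_x <- sum(ranks[seq_len(n1)])

# W as reported by R: rank sum of x minus its minimum possible value
w_manual <- rank_sum_x - n1 * (n1 + 1) / 2

# Cross-check against the built-in test
ref <- wilcox.test(x, y)
cat("manual W:", w_manual, " wilcox.test W:", unname(ref$statistic), "\n")
```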

Bootstrap Test

The bootstrap difference in means is a useful hypothesis test for comparing whether two means differ statistically. Each sample is resampled with replacement many times (1000 in our case), with each resample the same size as the original sample. For every iteration, the mean of each resample is calculated and the mean of the second sample is subtracted from the mean of the first. This yields a distribution of the difference in means, which is approximately normal. A confidence interval is then calculated at the chosen confidence level (95\(\%\) in our case). If the interval does not contain 0, we reject the null hypothesis, as we find a statistically significant difference between the means of the two populations: an entirely positive interval means that the first mean is significantly greater than the one subtracted, while an entirely negative interval means that it is significantly lower. However, if the confidence interval contains 0, we fail to reject the null hypothesis at our confidence level and cannot state that there is a difference between the means of both populations.
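
The procedure can be sketched in a few lines of base R. The data below are synthetic and the means were chosen for illustration only; the sketch uses the percentile interval, which is one common way to form a bootstrap confidence interval.

```r
set.seed(3)
# Synthetic feature values for two hypothetical artists (not Spotify data)
x <- rnorm(200, mean = 0.65, sd = 0.15)
y <- rnorm(150, mean = 0.55, sd = 0.15)
iter <- 2000

# Each replicate resamples both groups with replacement at their original sizes
# and records the difference of the two resample means
boot_diff <- replicate(iter, {
  mean(sample(x, replace = TRUE)) - mean(sample(y, replace = TRUE))
})

# 95% percentile confidence interval for the difference in means
ci <- quantile(boot_diff, c(0.025, 0.975))

# Reject the null hypothesis only when the interval excludes zero
contains_zero <- ci[1] <= 0 && ci[2] >= 0
print(ci)
cat("Interval contains zero:", contains_zero, "\n")
```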

EDA

The objective of Exploratory Data Analysis is to give us a better understanding of our data sets by obtaining information about the data’s range, characteristics, correlations, patterns, and visual outliers. This step is crucial for making the right assumptions, cross checking results, and making the right conclusions.

The plots below provide us with relevant information about each of the seven features based on each artist. While no final conclusions can be drawn from these, they will allow us to visualize the data and better understand it.

cat("Taylor Swift has",nrow(ts),"tracks.","\n")
Taylor Swift has 530 tracks. 
cat("BTS has",nrow(bts),"tracks.","\n")
BTS has 294 tracks. 
cat("Bad Bunny has",nrow(bb),"tracks.","\n")
Bad Bunny has 113 tracks. 
cat("Drake has",nrow(dk),"tracks.","\n")
Drake has 308 tracks. 
cat("The Weeknd has",nrow(wd),"tracks.","\n")
The Weeknd has 252 tracks. 

The row counts of the data frames above give the number of tracks returned for each artist on Spotify. Taylor Swift has the largest catalog, with 530 individual tracks. Bad Bunny has the smallest catalog of the top five streamed artists, with only 113 individual tracks. To help put those numbers into perspective, consider that the average EP is 4 to 5 songs long, and the average album is 10 to 12 songs long. Even though Bad Bunny has the smallest catalog of the top five, he still has far more songs than the average Spotify artist. It makes sense that the most streamed artists would be ones that have a significant number of tracks for listeners to play.

mx = cor(df[,c(2,3,4,5,6,7,8,9,10)])
ggcorrplot(mx, hc.order = TRUE, type = "lower", lab=TRUE, title = "Correlation between features")

Above is a correlation heatmap of all of the initially collected features. After preliminary EDA, we excluded the features Instrumentalness and Liveness from further EDA because, given the nature of the top five artists, they provide very little useful information. In the heatmap, the highest correlation is between energy and loudness. This makes sense, as we can expect louder songs to feel like they have more energy. There is also a notable positive correlation between energy and valence, suggesting that high-energy songs are more likely to sound positive. There are visible negative correlations between loudness and acousticness, as well as between energy and acousticness. This also makes sense, as acoustic songs tend to sound slower and lower in energy than non-acoustic ones. One of the more interesting findings from this plot is that neither tempo nor speechiness has a strong correlation with any other feature.

# aes() with bare column names replaces the deprecated aes_string()
ggplot(df, aes(x = Artist_name, y = Speechiness, fill = Artist_name)) +
geom_boxplot() + ggtitle("Speechiness Comparison") + xlab("Artists") + ylab("Speechiness") +
scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) + theme_minimal()

# aes() with bare column names replaces the deprecated aes_string()
ggplot(df, aes(x = Artist_name, y = Loudness, fill = Artist_name)) +
geom_boxplot() + ggtitle("Loudness Comparison") + xlab("Artists") + ylab("Loudness") +
scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) + theme_minimal()

Above are box plots for the features speechiness and loudness. Speechiness describes the amount of spoken word in a Spotify track. We can see that all of the top five artists have relatively low speechiness scores, with Taylor Swift’s median and interquartile range being by far the smallest. Drake has the largest median and interquartile range. This is likely due to the artists’ genres: Taylor Swift’s songs are almost all pop or country pop, while Drake’s music is closer to rap/hip-hop. In the loudness box plot, all of the artists have relatively high loudness scores. BTS has the highest loudness score, followed by Bad Bunny. Drake, Taylor Swift, and The Weeknd all have relatively similar loudness scores. This is likely also due to genre: BTS is a K-pop group, while Bad Bunny is a reggaeton artist.

# aes() with bare column names replaces the deprecated aes_string()
ggplot(df, aes(x = Artist_name, y = Instrumentalness, fill = Artist_name)) +
geom_boxplot() + ggtitle("Instrumentalness Comparison") + xlab("Artists") + ylab("Instrumentalness") +
scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) + theme_minimal()

Above is the box plot for instrumentalness. Instrumentalness describes whether a track is instrumental as opposed to vocal. We chose not to include this feature in our analysis because almost all of the values are 0. This is expected, because all of the top five artists are singers.

ggplot(df, aes(x = Energy, fill = Artist_name)) +
geom_density(alpha = 0.5) +
labs(title = "Energy Comparison", x = "Energy", y = "Density") +
scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
theme_minimal()

ggplot(df, aes(x = Tempo, fill = Artist_name)) +
geom_density(alpha = 0.5) +
labs(title = "Tempo Comparison", x = "Tempo", y = "Density") +
scale_fill_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
theme_minimal()

Above are density plots for the features energy and tempo. This type of plot allows us to view the distributions of the features for each artist on top of each other. Looking at the energy densities, we can see right away that BTS has a much higher energy distribution than the other artists, and also a lower standard deviation. Bad Bunny has the second highest energy distribution, and Drake has the lowest. One interesting thing to note is that Taylor Swift appears to have the highest standard deviation among the top five artists.

ggplot(data = df, aes(x = Artist_name, y = Danceability, fill = Artist_name)) +
    geom_violin(width = 0.5) +
    geom_boxplot(width = 0.9, color = "grey", alpha = 0.2) +
    scale_fill_viridis(discrete = TRUE) +
    labs(title = "Danceability Comparison", x = "Artists", y = "Danceability")

ggplot(data = df, aes(x = Artist_name, y = Acousticness, fill = Artist_name)) +
    geom_violin(width = 0.5) +
    geom_boxplot(width = 0.9, color = "grey", alpha = 0.2) +
    scale_fill_viridis(discrete = TRUE) +
    labs(title = "Acousticness Comparison", x = "Artists", y = "Acousticness")

ggplot(data = df, aes(x = Artist_name, y = Valence, fill = Artist_name)) +
    geom_violin(width = 0.5) +
    geom_boxplot(width = 0.9, color = "grey", alpha = 0.2) +
    scale_fill_viridis(discrete = TRUE) +
    labs(title = "Valence Comparison", x = "Artists", y = "Valence")

Above are violin plots with overlaid box plots for the features danceability, acousticness, and valence. This type of plot shows the full shape of each artist’s distribution alongside its median and interquartile range.

ggplot(data = df, aes(x = Artist_name, y = Acousticness, color = Artist_name)) +
  geom_jitter(alpha = 0.7, width = 0.3) +
  geom_boxplot(width = 0.2, outlier.shape = NA, coef = 0) +
  labs(title = "Acousticness Comparison",
       x = "Artist",
       y = "Acousticness") +
  scale_color_manual(values = c("lightblue1", "pink1", "burlywood1", "turquoise", "purple3")) +
  theme_minimal()

Above is a jittered scatter plot of acousticness with an overlaid box plot. Again, we can see that BTS has a much lower acousticness score and a significantly narrower interquartile range. Taylor Swift again has a much wider interquartile range than any of the other top five artists. Another interesting takeaway from this plot is the visual representation of the scale of the artists’ catalogs: the density of points for Taylor Swift is much greater than for artists with smaller catalogs like BTS and Bad Bunny.

plot_ly(df, x = ~Energy, y = ~Loudness, z = ~Artist_name, type = 'scatter3d', color = ~Artist_name, mode = 'markers')

Above is a 3D plot of loudness and energy by artist. As in the correlation heatmap, there is a positive correlation between energy and loudness for all artists.

plot_ly(df, x = ~Loudness, y = ~Acousticness, z = ~Artist_name, type = 'scatter3d', color = ~Artist_name, mode = 'markers')

Despite the correlation coefficient of -0.56 between loudness and acousticness, the 3D plot does not show a strong correlation visually.

Hypothesis Testing

Functions for bootstrap tests
# Function to bootstrap the mean difference of two samples for the test
boot.test = function(x1,x2,feature,iter){
    # Preallocate one slot per bootstrap iteration (not per observation)
    boot1 = numeric(iter)
    boot2 = numeric(iter)
    for(i in seq_len(iter)){
        # Resample each group with replacement at its original size
        boot1[i] = mean(sample(x1[[feature]],length(x1[[feature]]),replace=TRUE))
        boot2[i] = mean(sample(x2[[feature]],length(x2[[feature]]),replace=TRUE))
    }
    # Vector of iter differences in resample means
    return(boot1 - boot2)
}

# Function to check if 0 is within the range
contains_zero <- function(quantiles) {
  return (quantiles[1] <= 0 && quantiles[2] >= 0)
}

These two functions make the bootstrap tests easier to run. The function boot.test returns a vector of differences in means between bootstrap resamples of the two groups. From the quantiles of that vector, we can use the contains_zero function to check whether the confidence interval at a given level includes zero.

Acousticness

T-Test
pairwise.t.test(df$Acousticness, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Acousticness and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          0.00287   -       -       -           
Drake        1.00000   7.3e-06 -       -           
Taylor Swift 0.01984   < 2e-16 0.00053 -           
The Weeknd   1.00000   2.3e-07 1.00000 0.02917     

P value adjustment method: holm 
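
The output above reports Holm-adjusted p-values, which is the default adjustment in pairwise.t.test(). As a minimal sketch of what that adjustment does, the hypothetical raw p-values below (illustrative only, not taken from the analysis) are adjusted by hand and compared with p.adjust().

```r
# Hypothetical raw p-values for four comparisons (illustrative only)
p_raw <- c(0.001, 0.010, 0.030, 0.040)
m <- length(p_raw)

# Holm: sort ascending, multiply the i-th smallest by (m - i + 1),
# enforce monotonicity with a running maximum, and cap at 1
ord      <- order(p_raw)
scaled   <- p_raw[ord] * (m - seq_len(m) + 1)
p_manual <- numeric(m)
p_manual[ord] <- pmin(1, cummax(scaled))

# Built-in equivalent
p_holm <- p.adjust(p_raw, method = "holm")

print(rbind(raw = p_raw, manual = p_manual, holm = p_holm))
```

In this example the two largest raw p-values are below 0.05 but their adjusted values are 0.06, so they would no longer be declared significant after adjustment.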

From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the means of acousticness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Taylor Swift
  • Drake and Taylor Swift
  • Taylor Swift and The Weeknd

The pairs below have p-values higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of acousticness.

  • Bad Bunny and Drake
  • Bad Bunny and The Weeknd
  • Drake and The Weeknd

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Acousticness, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Acousticness and df$Artist_name 

             Bad Bunny BTS     Drake Taylor Swift
BTS          7.2e-11   -       -     -           
Drake        0.67      1.8e-10 -     -           
Taylor Swift 1.00      6.4e-15 0.29  -           
The Weeknd   1.00      1.7e-09 1.00  0.67        

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the distributions of acousticness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd

The pairs below have p-values higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the distributions of acousticness.

  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Drake and The Weeknd
  • Taylor Swift and The Weeknd

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Acousticness",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Acousticness",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Acousticness",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Acousticness",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Acousticness",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Acousticness",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Acousticness",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Acousticness",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Acousticness",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Acousticness",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
          2.5%       97.5%  Test
1   0.03145473  0.12580009 FALSE
2   0.15142479  0.22677615 FALSE
3   0.03919869  0.11750676 FALSE
4   0.01323335  0.09662670 FALSE
5  -0.15103426 -0.06475363 FALSE
6  -0.14570151 -0.07434646 FALSE
7  -0.17238574 -0.09292316 FALSE
8  -0.05390404  0.04314970  TRUE
9  -0.07861085  0.02382188  TRUE
10 -0.06463168  0.01776919  TRUE

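The rows of the table follow the order of the calls in the code: Taylor Swift is compared with Bad Bunny, BTS, Drake, and The Weeknd (rows 1 to 4), then BTS with Bad Bunny, Drake, and The Weeknd (rows 5 to 7), then Bad Bunny with Drake and The Weeknd (rows 8 and 9), and finally Drake with The Weeknd (row 10). The Test column is TRUE when the interval contains 0. From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis for pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of acousticness between the following pairs of artists: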

  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and BTS
  • Taylor Swift and The Weeknd
  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and The Weeknd

The pairs below have confidence intervals that include 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of acousticness.

  • Bad Bunny and Drake
  • Bad Bunny and The Weeknd
  • Drake and The Weeknd
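
The ten rows of the bootstrap table are identified only by their position. A possible variation, sketched here under assumptions, loops over all pairs with combn() and labels each row with the pair it describes. The three groups A, B, and C and their values are synthetic stand-ins for the per-artist data frames, and boot_diff is a hypothetical helper written for this sketch, not the report's boot.test function.

```r
set.seed(4)
# Synthetic stand-ins for per-artist data frames (hypothetical values)
groups <- list(
  A = data.frame(Acousticness = rnorm(120, 0.30, 0.10)),
  B = data.frame(Acousticness = rnorm(100, 0.10, 0.10)),
  C = data.frame(Acousticness = rnorm(110, 0.30, 0.10))
)

# Bootstrap distribution of the difference in means for one feature
boot_diff <- function(x1, x2, feature, iter = 1000) {
  replicate(iter,
            mean(sample(x1[[feature]], replace = TRUE)) -
              mean(sample(x2[[feature]], replace = TRUE)))
}

# One column per unordered pair of group names
group_pairs <- combn(names(groups), 2)

# Build one labeled row per pair with its 95% percentile interval
rows <- lapply(seq_len(ncol(group_pairs)), function(j) {
  a <- group_pairs[1, j]
  b <- group_pairs[2, j]
  ci <- quantile(boot_diff(groups[[a]], groups[[b]], "Acousticness"),
                 c(0.025, 0.975))
  data.frame(pair = paste(a, "-", b),
             lower = ci[[1]],
             upper = ci[[2]],
             contains_zero = ci[[1]] <= 0 && ci[[2]] >= 0)
})
result <- do.call(rbind, rows)
print(result)
```

Labeling the rows this way removes the need to cross-reference row numbers against the order of the calls.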

Danceability

T-Test
pairwise.t.test(df$Danceability, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Danceability and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          < 2e-16   -       -       -           
Drake        3.3e-11   0.0041  -       -           
Taylor Swift < 2e-16   0.0082  2.8e-09 -           
The Weeknd   < 2e-16   1.2e-12 < 2e-16 2.4e-08     

P value adjustment method: holm 

From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the means of danceability between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, all pairs differ significantly.

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Danceability, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Danceability and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          < 2e-16   -       -       -           
Drake        3.4e-09   0.0089  -       -           
Taylor Swift < 2e-16   0.0089  1.8e-07 -           
The Weeknd   < 2e-16   2.1e-06 3.8e-12 0.0024      

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test with a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the distributions of danceability between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, all pairs differ significantly.

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Danceability",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Danceability",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Danceability",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Danceability",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Danceability",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Danceability",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Danceability",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Danceability",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Danceability",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Danceability",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
          2.5%       97.5%  Test
1  -0.18312507 -0.13963247 FALSE
2  -0.04335932 -0.01053287 FALSE
3  -0.07815399 -0.04174049 FALSE
4   0.03749782  0.08341862 FALSE
5  -0.16304285 -0.11365618 FALSE
6  -0.05548508 -0.01453066 FALSE
7   0.06023294  0.11069027 FALSE
8   0.07626182  0.12827765 FALSE
9   0.19301882  0.25400881 FALSE
10  0.09369581  0.14718095 FALSE

From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis for pairs where the confidence interval doesn’t include 0. The rows follow the order of the calls in the code, and every interval excludes 0. That is, there is a significant difference in the means of danceability between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, all pairs differ significantly.

Energy

T-Test
pairwise.t.test(df$Energy, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Energy and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          9.9e-08   -       -       -           
Drake        8.3e-09   < 2e-16 -       -           
Taylor Swift 3.4e-06   < 2e-16 0.05909 -           
The Weeknd   0.00014   < 2e-16 0.05190 0.54488     

P value adjustment method: holm 

From this pairwise T-Test with a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the means of energy between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd

The pairs below have p-values higher than 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of energy.

  • Drake and Taylor Swift
  • Drake and The Weeknd
  • Taylor Swift and The Weeknd

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Energy, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Energy and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          1.3e-14   -       -       -           
Drake        7.7e-11   < 2e-16 -       -           
Taylor Swift 3.3e-05   < 2e-16 0.04077 -           
The Weeknd   0.00013   < 2e-16 0.00448 0.49284     

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. Because the Wilcoxon test compares ranks rather than means, a rejection indicates a significant shift in the distribution of energy between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Drake and The Weeknd

The remaining pair has a p-value above 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the distribution of energy.

  • Taylor Swift and The Weeknd
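Reading significant pairs off the p-value table by hand is easy to get wrong, so the lists could instead be generated from the test object. The sketch below assumes simulated data with made-up artist labels A, B, and C in place of df$Energy and df$Artist_name. It relies on the p.value matrix that pairwise.wilcox.test and pairwise.t.test return.

```r
set.seed(42)
# Simulated data standing in for df$Energy and df$Artist_name.
sim <- data.frame(
  Artist = rep(c("A", "B", "C"), each = 50),
  Energy = c(rnorm(50, 0.5, 0.1), rnorm(50, 0.5, 0.1), rnorm(50, 0.8, 0.1))
)

res <- pairwise.wilcox.test(sim$Energy, sim$Artist)

# res$p.value is a lower-triangular matrix of Holm-adjusted p-values;
# the unused upper triangle is NA and is ignored by which().
p <- res$p.value
idx <- which(p < 0.05, arr.ind = TRUE)
sig_pairs <- paste(rownames(p)[idx[, "row"]], "and", colnames(p)[idx[, "col"]])
sig_pairs
```

The same extraction works unchanged on the result of pairwise.t.test, since both functions return the same kind of object.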

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Energy",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Energy",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Energy",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Energy",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Energy",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Energy",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Energy",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Energy",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Energy",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Energy",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
           2.5%        97.5%  Test
1  -0.116644546 -0.064769176 FALSE
2  -0.226040327 -0.177643880 FALSE
3  -0.000763686  0.052240172  TRUE
4  -0.036032173  0.021274537  TRUE
5   0.081022973  0.136264820 FALSE
6   0.200198673  0.253432931 FALSE
7   0.164795943  0.221146223 FALSE
8   0.090670015  0.147893217 FALSE
9   0.053414202  0.112245306 FALSE
10 -0.063890736 -0.005496979 FALSE

From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of energy between the following pairs of artists:

  • Taylor Swift and Bad Bunny
  • Taylor Swift and BTS
  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and The Weeknd
  • Drake and The Weeknd

The remaining pairs have confidence intervals that include 0 (rows 3 and 4 of the table), so we fail to reject the null hypothesis. That is, there is no significant difference in the means of energy.

  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd

Loudness

T-Test
pairwise.t.test(df$Loudness, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Loudness and df$Artist_name 

             Bad Bunny BTS     Drake  Taylor Swift
BTS          0.0097    -       -      -           
Drake        2.0e-11   < 2e-16 -      -           
Taylor Swift 1.3e-06   < 2e-16 0.0030 -           
The Weeknd   1.9e-14   < 2e-16 0.1199 6.6e-06     

P value adjustment method: holm 

From this pairwise T-Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the means of loudness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Taylor Swift and The Weeknd

The remaining pair has a p-value above 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of loudness.

  • Drake and The Weeknd

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Loudness, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Loudness and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          1.7e-09   -       -       -           
Drake        < 2e-16   < 2e-16 -       -           
Taylor Swift 1.3e-07   < 2e-16 0.00048 -           
The Weeknd   < 2e-16   < 2e-16 0.63434 0.00088     

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the distribution of loudness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Taylor Swift and The Weeknd

The remaining pair has a p-value above 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the distribution of loudness.

  • Drake and The Weeknd

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Loudness",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Loudness",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Loudness",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Loudness",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Loudness",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Loudness",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Loudness",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Loudness",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Loudness",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Loudness",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
         2.5%      97.5%  Test
1  -2.0058209 -1.2103081 FALSE
2  -2.9071680 -2.1421337 FALSE
3   0.3154683  1.1037719 FALSE
4   0.5642941  1.6810703 FALSE
5   0.5125664  1.3751731 FALSE
6   2.8333774  3.6895670 FALSE
7   3.1097462  4.2624262 FALSE
8   1.8730466  2.7376358 FALSE
9   2.1545412  3.3367014 FALSE
10 -0.1657850  0.9915159  TRUE

From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of loudness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Taylor Swift and The Weeknd

The remaining pair has a confidence interval that includes 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of loudness.

  • Drake and The Weeknd
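The rows of result_matrix are numbered 1 to 10 in the order the intervals c1 to c10 were computed, so each row has to be mapped back to an artist pair by hand. A sketch of attaching labels is given below. It assumes that ts, bb, bts, dk, and wd hold the songs of Taylor Swift, Bad Bunny, BTS, Drake, and The Weeknd respectively, which is inferred from the variable names, and it uses the loudness bounds from the table above, rounded to three decimals.

```r
# Pair labels in the same order as c1..c10 in the bootstrap code above.
# The artist behind each variable name is an assumption based on its name.
pair_labels <- c(
  "Taylor Swift - Bad Bunny", "Taylor Swift - BTS",
  "Taylor Swift - Drake",     "Taylor Swift - The Weeknd",
  "BTS - Bad Bunny",          "BTS - Drake",
  "BTS - The Weeknd",         "Bad Bunny - Drake",
  "Bad Bunny - The Weeknd",   "Drake - The Weeknd"
)

# Loudness interval bounds copied (rounded) from the output above.
lower <- c(-2.006, -2.907, 0.315, 0.564, 0.513, 2.833, 3.110, 1.873, 2.155, -0.166)
upper <- c(-1.210, -2.142, 1.104, 1.681, 1.375, 3.690, 4.262, 2.738, 3.337,  0.992)

labelled <- data.frame(
  Pair = pair_labels,
  Lower = lower,
  Upper = upper,
  ContainsZero = lower <= 0 & upper >= 0,
  stringsAsFactors = FALSE
)

# Pairs for which we fail to reject the null hypothesis.
labelled$Pair[labelled$ContainsZero]  # "Drake - The Weeknd"
```

With labels attached, the list of non-significant pairs can be read directly from the table rather than reconstructed from row numbers.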

Speechiness

T-Test
pairwise.t.test(df$Speechiness, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Speechiness and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          0.036     -       -       -           
Drake        2.3e-10   3.7e-08 -       -           
Taylor Swift 8.3e-10   < 2e-16 < 2e-16 -           
The Weeknd   1.1e-05   < 2e-16 < 2e-16 0.067       

P value adjustment method: holm 

From this pairwise T-Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the means of speechiness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Drake and The Weeknd

The remaining pair has a p-value above 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of speechiness.

  • Taylor Swift and The Weeknd

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Speechiness, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Speechiness and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          0.18      -       -       -           
Drake        2.2e-07   5.2e-08 -       -           
Taylor Swift < 2e-16   < 2e-16 < 2e-16 -           
The Weeknd   9.5e-11   < 2e-16 < 2e-16 4.4e-15     

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the distribution of speechiness between the following pairs of artists:

  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Drake and The Weeknd
  • Taylor Swift and The Weeknd

The remaining pair has a p-value above 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the distribution of speechiness.

  • BTS and Bad Bunny

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Speechiness",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Speechiness",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Speechiness",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Speechiness",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Speechiness",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Speechiness",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Speechiness",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Speechiness",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Speechiness",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Speechiness",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
          2.5%        97.5%  Test
1  -0.09588038 -0.053941454 FALSE
2  -0.12313150 -0.084770263 FALSE
3  -0.17459153 -0.139219394 FALSE
4  -0.02439835 -0.006545484 FALSE
5   0.00124871  0.054832145 FALSE
6  -0.07758370 -0.027889735 FALSE
7   0.06917586  0.108420265 FALSE
8  -0.10833284 -0.056495810 FALSE
9   0.03853266  0.080576546 FALSE
10  0.12261839  0.159687318 FALSE

From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of speechiness between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, every pair of artists differs significantly in mean speechiness.

Tempo

T-Test
pairwise.t.test(df$Tempo, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Tempo and df$Artist_name 

             Bad Bunny BTS Drake Taylor Swift
BTS          1         -   -     -           
Drake        1         1   -     -           
Taylor Swift 1         1   1     -           
The Weeknd   1         1   1     1           

P value adjustment method: holm 

From this pairwise T-Test at a 95% confidence level, we would reject the null hypothesis for any pair with a p-value below 0.05. Every adjusted p-value here is 1, so there is no significant difference in the means of tempo between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, we fail to reject the null hypothesis for every pair: no pair of artists shows a significant difference in mean tempo.

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Tempo, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Tempo and df$Artist_name 

             Bad Bunny BTS Drake Taylor Swift
BTS          1         -   -     -           
Drake        1         1   -     -           
Taylor Swift 1         1   1     -           
The Weeknd   1         1   1     1           

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test at a 95% confidence level, we would reject the null hypothesis for any pair with a p-value below 0.05. Every adjusted p-value here is 1, so there is no significant difference in the distribution of tempo between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, we fail to reject the null hypothesis for every pair: no pair of artists shows a significant difference in tempo.

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Tempo",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Tempo",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Tempo",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Tempo",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Tempo",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Tempo",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Tempo",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Tempo",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Tempo",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Tempo",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
         2.5%    97.5% Test
1  -5.2202930 7.770394 TRUE
2  -2.6147977 5.500449 TRUE
3  -0.7089945 7.649112 TRUE
4  -1.9208334 6.512297 TRUE
5  -6.7612698 6.169327 TRUE
6  -3.0137000 6.329788 TRUE
7  -4.0530618 5.531469 TRUE
8  -4.9221611 8.983260 TRUE
9  -5.7500397 7.536206 TRUE
10 -6.0077789 3.635489 TRUE

From this Bootstrap Test with a 95% confidence level, we would reject the null hypothesis for any pair whose confidence interval doesn’t include 0. Every interval here includes 0, so there is no significant difference in the means of tempo between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and The Weeknd
  • Drake and Bad Bunny
  • Drake and The Weeknd
  • Bad Bunny and The Weeknd

Thus, we fail to reject the null hypothesis for every pair: no pair of artists shows a significant difference in mean tempo.

Valence

T-Test
pairwise.t.test(df$Valence, df$Artist_name)

    Pairwise comparisons using t tests with pooled SD 

data:  df$Valence and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          0.03659   -       -       -           
Drake        4.9e-09   < 2e-16 -       -           
Taylor Swift 0.00011   < 2e-16 0.00193 -           
The Weeknd   2.3e-12   < 2e-16 0.07401 1.2e-06     

P value adjustment method: holm 

From this pairwise T-Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the means of valence between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Taylor Swift
  • Bad Bunny and Drake
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Taylor Swift and The Weeknd

The remaining pair has a p-value above 0.05, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of valence.

  • Drake and The Weeknd

Wilcoxon Rank Sum Test
pairwise.wilcox.test(df$Valence, df$Artist_name)

    Pairwise comparisons using Wilcoxon rank sum test with continuity correction 

data:  df$Valence and df$Artist_name 

             Bad Bunny BTS     Drake   Taylor Swift
BTS          0.03206   -       -       -           
Drake        1.2e-07   < 2e-16 -       -           
Taylor Swift 0.00081   < 2e-16 0.00081 -           
The Weeknd   5.8e-10   < 2e-16 0.03206 1.4e-07     

P value adjustment method: holm 

From this pairwise Wilcoxon Rank Sum Test at a 95% confidence level, we reject the null hypothesis for pairs with a p-value below 0.05. That is, there is a significant difference in the distribution of valence between the following pairs of artists:

  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and Taylor Swift
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and Taylor Swift
  • Bad Bunny and The Weeknd
  • Drake and Taylor Swift
  • Drake and The Weeknd
  • Taylor Swift and The Weeknd

Thus, every pair of artists differs significantly in valence.

Bootstrap Test
c1 = quantile(boot.test(ts,bb,"Valence",1000),c(0.025,0.975))
c2 = quantile(boot.test(ts,bts,"Valence",1000),c(0.025,0.975))
c3 = quantile(boot.test(ts,dk,"Valence",1000),c(0.025,0.975))
c4 = quantile(boot.test(ts,wd,"Valence",1000),c(0.025,0.975))
c5 = quantile(boot.test(bts,bb,"Valence",1000),c(0.025,0.975))
c6 = quantile(boot.test(bts,dk,"Valence",1000),c(0.025,0.975))
c7 = quantile(boot.test(bts,wd,"Valence",1000),c(0.025,0.975))
c8 = quantile(boot.test(bb,dk,"Valence",1000),c(0.025,0.975))
c9 = quantile(boot.test(bb,wd,"Valence",1000),c(0.025,0.975))
c10 = quantile(boot.test(dk,wd,"Valence",1000),c(0.025,0.975))

result_matrix = matrix(c(
    c1, c2, c3, c4, c5, c6, c7, c8, c9, c10
), ncol = 2, byrow = TRUE)

result_matrix = data.frame(result_matrix)
include_zero = apply(result_matrix, 1, contains_zero)
result_matrix["Test"] = include_zero
colnames(result_matrix) = c("2.5%","97.5%","Test")
result_matrix
           2.5%       97.5%  Test
1  -0.132442197 -0.04278886 FALSE
2  -0.168747674 -0.11374561 FALSE
3   0.021700594  0.07877411 FALSE
4   0.048681669  0.11189445 FALSE
5   0.006958073  0.10077121 FALSE
6   0.159081283  0.22171720 FALSE
7   0.190242012  0.25549198 FALSE
8   0.091062874  0.18425703 FALSE
9   0.119720809  0.21491950 FALSE
10 -0.003522829  0.06684637  TRUE

From this Bootstrap Test with a 95% confidence level, we reject the null hypothesis on pairs where the confidence interval doesn’t include 0. That is, there is a significant difference in the means of valence between the following pairs of artists:

  • Taylor Swift and Bad Bunny
  • Taylor Swift and Drake
  • Taylor Swift and BTS
  • Taylor Swift and The Weeknd
  • BTS and Bad Bunny
  • BTS and Drake
  • BTS and The Weeknd
  • Bad Bunny and Drake
  • Bad Bunny and The Weeknd

The remaining pair has a confidence interval that includes 0, so we fail to reject the null hypothesis. That is, there is no significant difference in the means of valence.

  • Drake and The Weeknd

Conclusion

Each of these hypothesis tests compares every pair of artists on each feature and states whether they differ significantly at a 95% confidence level. Because the tests sometimes disagree for the same feature, we take the majority vote of the three tests as the more likely result (e.g., if two tests find a statistically significant difference between two artists and one does not, we conclude that there seems to be a significant difference between those artists). This rule is a heuristic rather than a rigorous conclusion, and further statistical research would be needed to determine whether these claims are fully correct. However, we believe a result is generally more likely to hold when two tests agree on it.
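As an illustration of this majority-vote rule, the sketch below applies it to the energy comparison between Drake and Taylor Swift, using the adjusted p-values and the bootstrap interval reported in the Energy section. The function name majority_vote is hypothetical and is not part of the report's code.

```r
# Decide significance for one artist pair by majority vote across tests.
# Each element of `rejects` is TRUE when that test rejected the null.
# majority_vote() is a hypothetical helper written for this illustration.
majority_vote <- function(rejects) {
  sum(rejects) > length(rejects) / 2
}

alpha <- 0.05

# Energy, Drake vs. Taylor Swift, using the values reported above.
rejects <- c(
  t_test    = 0.05909 < alpha,                           # FALSE
  wilcoxon  = 0.04077 < alpha,                           # TRUE
  bootstrap = !(-0.000763686 <= 0 && 0.052240172 >= 0)   # FALSE: CI covers 0
)

majority_vote(rejects)  # FALSE: treated as no significant difference
```

Only the Wilcoxon test rejects here, so under this rule the pair is treated as not significantly different in energy.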

Tempo is the only feature for which we fail to reject every null hypothesis. Thus, we cannot state that there is a significant difference between any pair of artists. This is an interesting finding, since it suggests that the five most streamed artists on Spotify, from all around the world, have similar tempos. It seems plausible that the majority of users like songs with a tempo that falls within the range of these artists. This could be an interesting factor for Spotify to take into account in its song recommendation algorithm. Furthermore, upcoming artists may want to take this information into consideration if they are looking for outreach strategies or popularity gain. We cannot state with confidence that this feature is deterministic of an artist’s popularity, and further statistical research would be needed to make that claim, but it would be a “safe bet”.

All the other features (Acousticness, Danceability, Energy, Loudness, Speechiness, and Valence) differ for a great number of pairs of artists. Thus, we can state confidently that most artists are statistically different in these features. However, based on our results, there seems to be more commonality among the pairs formed from Drake, Bad Bunny, and The Weeknd than among any others. This is expected for Drake and The Weeknd, who sing in similar genres, but it is an interesting finding for Bad Bunny, who mainly sings reggaeton rather than rap or pop.

While this information can be useful for many upcoming artists and maybe for Spotify developers, we suggest that upcoming artists not try to imitate or plagiarize any of these artists, but rather draw inspiration from them and bring their own selves to the table. While statistics are helpful for making inferences in any field, music is still an art, and everyone should develop and display their own personality through it.

References

Spangler, Todd. n.d. “Spotify Launches Wrapped 2022: Bad Bunny, Taylor Swift Are Most-Streamed Artists of the Year.” Available at https://variety.com/2022/music/news/spotify-wrapped-bad-bunny-taylor-swift-1235444491/ (accessed 2023-11-26).
Spotify. n.d. “Get Tracks’ Audio Features.” Available at https://developer.spotify.com/documentation/web-api/reference/get-several-audio-features (accessed 2023-11-26).